This self-contained and self-instructional unit is intended for use by evaluation and development personnel and by students in introductory research and evaluation courses. The unit is a discussion of the regression and matched-groups fallacy employing graphic illustrations with actual data. The user is introduced to the regression effect in the single-group pretest-posttest design, after which he responds to mastery test instructional exercises. The second part illustrates how the regression effect confounds the matched-pair type of design, and this, too, is followed by mastery test instructional exercises. The user should be familiar with the basic statistical concepts of mean, standard deviation, correlation, and z-scores. (Author/SE)
Many research and evaluation studies yield misleading, erroneous, or misinterpreted findings due to the failure to recognize the regression and matched-groups fallacy. This module is designed to develop the competencies needed to identify situations in which the regression effect confounds results.



5. Strategy: The general strategy selected for the solution of the problem above.

The training materials present a conceptual, non-mathematical treatment of the regression phenomenon using graphical illustrations with actual data. Self-instructional exercises are included that can also be used as a pretest.
Product Description: Describe the following; number each description.

1. Characteristics of the product.
2. How it works.
3. What it is intended to do.
4. Associated products, if any.
5. Special conditions, time, training, or other requirements.

Characteristics of the Product : 

An 18-page discussion of the regression and matched-groups fallacy employing graphic illustrations with actual data. The module is self-contained and self-instructional.

How it Works : 

The user is introduced to the regression effect in the single group pretest- 
posttest design, after which he responds to mastery test instructional exercises. 

The second part illustrates how the regression effect confounds the matched- 
pair type of design, followed by mastery test instructional exercises. 


What it is Intended to do : 


Provide the user with recognition of situations in which the regression 
effect confounds results. 


Requirements for Use :

User should be familiar with the basic statistical concepts of mean, standard deviation, correlation, and z-scores.



ERIC 




10. Product Users: Those individuals or groups expected to use the product.

The product is to be used by evaluators and consumers of research and evaluation reports as well as students in related courses.










Twenty-eight users responded to questions pertaining to the instructional quality of the module, the error rate for the programmed learning, and whether or not the materials were superfluous (duplicated other equally good sources).

The results are given below: 

Instructional value: "poor": 0%; "fair": 7%; "good": 46%; "very good": 46%.

Median error rate: 10%.

Materials superfluous: "yes": 15%; "no": 85%.

The rating of "good" or "very good" by 92% of the users suggests instructional 
value for this module. 
12. Potential Educational Consequences: Discuss not only the theoretical (i.e., conceivable) implications of your product but also the more probable implications of your product, especially over the next decade.

1. Fewer research and evaluation studies vulnerable to the confounding effects of regression.

2. Recognition by consumers of research and evaluation reports of the regression fallacy where it exists and, hence, fewer misinterpretations of findings.
15. Start-up Costs: Total costs to procure, install, and initiate use of the product.

Reproduction costs only.


16. Operating Costs: Projected costs for continuing use of the product after initial adoption and installation (i.e., fees, consumable supplies, special staff, training, etc.).

Reproduction only. 
17. Likely Market: What is the likely market for this product? Consider the size and type of the user group; number of possible substitute (competitor) products on the market; and the likely availability of funds to purchase the product by (for) the product user group.

Evaluation and development personnel, especially those who are being trained on the job at regional labs and at state departments of education.

Students in introductory research and evaluation courses.



INSTRUCTIONAL MODULE ON
REGRESSION AND THE MATCHING FALLACY
IN QUASI-EXPERIMENTAL RESEARCH¹

Perhaps the most subtle source of invalidity in behavioral research is the elusive phenomenon of regression. Even seasoned researchers have frequently failed to detect its presence; hence, it has spoiled many otherwise good research efforts. Studies of atypical and special groups have probably been the victims of the regression phenomenon more often than those in any other single area of inquiry. A simple statistical truism is that when subjects are selected because they deviate from the mean on some variable, regression will always occur.

Many studies on remediation and treatment of the handicapped and other deviant groups follow this pattern: those in greatest "need" are selected, a treatment is administered, and a reassessment then follows. For example, suppose all children having IQ scores below 80 were given some special treatment (e.g., glutamic acid) over a period of a year and were then retested. Assume that the time interval, etc., between testings was such that there was no practice or carryover effect. If the treatment had absolutely no effect, how would the experimental group fare on the posttest? Figure 1 illustrates the regression effect using actual IQ scores of 354 pupils tested in grade five and three years later in grade eight. The regression line shows the average IQ score at grade eight for any IQ score at grade five. For example, persons scoring 130 at grade five obtained an average score at grade eight of approximately 120. In other words, on the average, the grade-five 130s regressed about 10 points to 120 at grade eight. Notice that there is a similar regression toward the mean for low scorers at grade five. Notice also that the scores are just as variable at grade eight as they were at grade five; the example was selected so that the regression effect would not be confounded with changes in means on the X and Y variables. There is a correlation of .6 between pretest and posttest IQ scores in this illustration.

Figure 2 depicts a simplified illustrative situation. No treatment or practice effects are present for the treated group; the means and variances are identical in both distributions (as they are in most tests where standard scores are employed). Figure 2 illustrates that there is a definite and pronounced tendency for subjects to regress toward the posttest mean, to the point where subjects tend to be, on the average, only six-tenths as far from the posttest mean as they were on the pretest; i.e., on the average, examinees tend to deviate only 60 percent as much from the posttest mean as they did from the pretest mean. Those examinees with pretest IQ scores of 80 would, on the average, be only 60 percent as far below the posttest mean; they would be expected to have an average posttest score of 88, a substantial "gain" of 8 points. Those having IQ scores of 70 initially would appear to have gained 12 points, with a posttest mean of about 82.

¹ Based on Journal of Special Education, (3), 329-336, 1969, by the same author.
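The Figure 2 arithmetic (each examinee expected to be only r times as far from the mean on the posttest) can be reproduced with a short Python sketch. It assumes a linear regression with equal pretest and posttest means and standard deviations and no treatment or practice effect; the standard deviation of 15 is an assumption chosen to agree with a standard error of estimate of 12 (since 15 × √(1 − .6²) = 12), and the function name `expected_posttest` is illustrative only.

```python
def expected_posttest(pretest, r, mean=100.0, sd=15.0):
    """Average posttest score for examinees with a given pretest score.

    Assumes no treatment or practice effect, a linear regression, and equal
    pretest/posttest means and SDs (as in Figure 2). The SD of 15 is assumed.
    """
    z_pre = (pretest - mean) / sd   # distance from the mean in SD units
    z_post = r * z_pre              # on average only r times as far from the mean
    return mean + z_post * sd

for score in (80, 70):
    post = expected_posttest(score, r=0.6)
    print(f"pretest {score}: expected posttest {post:.0f}, apparent gain {post - score:+.0f}")
# pretest 80: expected posttest 88, apparent gain +8
# pretest 70: expected posttest 82, apparent gain +12
```

Because the means and SDs are equal, the assumed SD cancels out; calling `expected_posttest(110, r=0.6)` or `expected_posttest(120, r=0.6)` gives the answers to the first exercises below (106 and 112).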



The standard error of estimate, $s_{y \cdot x} = s_y\sqrt{1 - r^2}$, gives the standard deviation of posttest scores for persons having the same pretest score; in this example $s_{y \cdot x} = 12$ IQ points. Using $s_{y \cdot x}$, we can accurately predict the proportion of those with a given pretest score who will fall above (or below) any other IQ score on the posttest (provided the common assumptions of linearity and homoscedasticity between the two variables are met). Those scoring 70 on the first test will have a mean of 82 on the second test, with a standard deviation of 12 IQ points. Using a normal curve table, it is readily apparent that about 84 percent will regress upward, that is, receive higher IQ scores on the posttest, even without any practice effect. One-half will "gain" 12 or more IQ points; one-sixth will have IQs that "increased" by 24 or more points (i.e., obtain IQ scores of 94 or more). Further, about 7 percent of those with an initial IQ of 70 will obtain an IQ score of 100 or more on the second test, apart from any treatment or practice effect. Obviously, what may appear to an enthusiastic investigator to be striking improvements in a deviant population can result solely from the regression phenomenon. The following examples will serve to illustrate the problem.

Figure 2. Graphic presentation of a hypothetical situation in which a deviant group is selected and administered an inefficacious treatment. (Scatterplot of posttest IQ against pretest IQ, with the treated group marked.)
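The normal-curve proportions for the group scoring 70 can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch under the stated assumptions of linearity and homoscedasticity, plus an assumed normal distribution of posttest scores at each pretest level; the SD of 15 is again an assumption consistent with $s_{y \cdot x} = 12$.

```python
from statistics import NormalDist

r, sd, mean = 0.6, 15.0, 100.0   # assumed: equal pretest/posttest means and SDs
pretest = 70

mean_post = mean + r * (pretest - mean)   # expected posttest mean: 82
s_yx = sd * (1 - r ** 2) ** 0.5           # standard error of estimate: 12
posttest = NormalDist(mean_post, s_yx)    # posttest scores of the "70s", assumed normal

def share_at_or_above(score):
    """Proportion of the pretest-70 group expected to score at or above `score`."""
    return 1 - posttest.cdf(score)

for label, cut in [("above 70 (any 'gain')", 70), ("gain of 12 or more", 82),
                   ("gain of 24 or more", 94), ("100 or more", 100)]:
    print(f"{label:>22}: {share_at_or_above(cut):.0%}")
```

Under these assumptions the loop prints about 84%, 50%, 16%, and 7%, matching the normal-curve-table reasoning in the text.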

Figure 3 is included to demonstrate that the regression effect is not simply a result of measurement error. Indeed, Galton first observed the phenomenon in the stature of fathers and sons, as illustrated in Figure 3, and termed it the "law of filial regression." Note that tall fathers tend to have sons who are not as tall as they are, and short fathers tend to have sons who are not as short as they are; that is, the sons regress toward the mean. Notice also that tall sons have fathers who are not as tall as they are: regression occurs going from X to Y or from Y to X.
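Because regression runs in both directions even when the two distributions are identical, it can be seen in a quick simulation. The sketch below is an illustration only: it draws standardized (father, son) pairs from an assumed bivariate normal distribution with r = .56 rather than using McNemar's actual data, and the 1.5 SD cutoff for "tall" is an arbitrary choice.

```python
import random
from statistics import mean

def simulate_pairs(n, r, seed=1):
    """Draw n (father, son) pairs in standard-score units with correlation r.

    Both variables have mean 0 and SD 1, so any regression that shows up is
    not due to a change in means or variability.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        father = rng.gauss(0.0, 1.0)
        # son = r * father + independent noise scaled so that Var(son) = 1
        son = r * father + (1.0 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((father, son))
    return pairs

pairs = simulate_pairs(100_000, r=0.56)

# X to Y: sons of tall fathers (at least 1.5 SD above the mean)
tall_fathers = [(f, s) for f, s in pairs if f >= 1.5]
mean_tall_fathers = mean(f for f, _ in tall_fathers)
mean_their_sons = mean(s for _, s in tall_fathers)

# Y to X: fathers of tall sons
tall_sons = [(f, s) for f, s in pairs if s >= 1.5]
mean_tall_sons = mean(s for _, s in tall_sons)
mean_their_fathers = mean(f for f, _ in tall_sons)

print(f"tall fathers {mean_tall_fathers:.2f} SD -> their sons {mean_their_sons:.2f} SD")
print(f"tall sons {mean_tall_sons:.2f} SD -> their fathers {mean_their_fathers:.2f} SD")
```

In both directions the relatives' average lands only about .56 times as far from the mean, which is the pattern the text describes for Figure 3.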

INSTRUCTIONAL EXERCISES 

Assuming no practice or testing effect in the situation depicted in Figure 2:

1. The expected or average score on the posttest for persons scoring 110 on the pretest is ______.

(106)

2. The average score on the posttest for persons scoring 90 on the pretest is ______.

(94)

3. If those scoring 90 on the pretest tend to score higher on the posttest, are they regressing?

(Yes, statistical regression is movement toward the mean of the group from which persons were selected.)
Figure 3. Scatterplot showing the regression phenomenon (r = .56) in height of 192 fathers (X-variable, horizontal axis: height of fathers) and sons (Y-variable). The average height of sons is given for fathers in each column; the average height of fathers is given for sons in each row. (Note that in each instance, X to Y and Y to X, there is regression, although means and standard deviations are approximately equal for both X and Y.)

Data from McNemar (1962, p. 117).

4. Did both the 90 and 110 groups regress equally?

(Yes)



5. The expected posttest score for persons scoring 120 on the pretest would be ______.

(112)

6. Did the "120s" regress more than the "110s"?

(Yes, 8 points vs. 4 points.)

7. Was the ratio below the same for both the "110s" and "120s"?

$$\frac{\text{expected deviation from posttest mean}}{\text{actual deviation from pretest mean}} = \frac{y'}{x} = \frac{Y' - \bar{Y}}{X - \bar{X}}$$

(Yes, 6/10 = .6, and 12/20 = .6.)

The above ratio is the coefficient of correlation when the standard deviations for the pretest and posttest are equal, i.e., $\sigma_x = \sigma_y$. A more general expression is $r = z'_y / z_x$, where $z'_y$ is the expected standard z-score on the posttest, and $z_x$ is the actual standard z-score on the pretest.

8. Recall that z-scores are also called sigma scores because they express performance in standard deviation units. A z-score of +1.5 indicates the score was 1.5 standard deviations above the ______.

(mean)

Suppose the X-variable in Figure 2 is unchanged, but the Y-variable is a standardized grade-five reading test. Descriptive data for each are given below ($r_{xy} = .6$):

                       IQ (X)    Reading (Y)
Mean                   100       5.0 (grade equivalents)
Standard deviation     16        1.5

9. For persons with IQ scores of 132 on the IQ test (pretest), what is the expected ("most probable") reading score? Since $r = z'_y / z_x$, it follows that $z'_y = r z_x$. A score of 132 in z-score units is ______.

($x = X - \bar{X} = 132 - 100 = 32$; $z_x = x / s_x = 32/16 = 2.0$.)

10. Then, $z'_y = r z_x$ = (____)(____) = 1.2.

((.6)(2.0))

11. To convert $z'_y$ to grade-equivalent units, recall that $z'_y = 1.2$ indicates a performance 1.2 standard deviations above the mean; hence $y' = z'_y s_y$, or (____)(____) = 1.8.

((1.2)(1.5))

12. Hence, the expected reading score for persons scoring 132 on the IQ test is 1.8 grade equivalents above the mean, or (____) + (____) = 6.8.

((5.0) + (1.8))

13. We have been illustrating the statistical basis for the regression effect. Conceptually, we should understand that whenever subjects are selected because they are atypically low or high on some measure, on reassessment they will tend to ______ toward the mean.

(regress)

14. In the above example, was the percentile rank of the IQ score of 132 above the percentile rank of the reading level?

(Yes, 98th percentile vs. 88th percentile.)

15. One study compared the IQ scores of retarded mothers with corresponding IQ scores of their offspring who had been given a cognitive enrichment program. Will the children have significantly higher scores even if the enrichment is without efficacy?

(Yes)
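The z-score route used in exercises 9 through 14 (convert X to SD units, shrink by r, convert back into Y's units) can be checked with a short sketch. The means and SDs are the ones used in those exercises; the function name `predict_across_scales` is illustrative, and the percentile ranks assume normally distributed scores.

```python
from statistics import NormalDist

def predict_across_scales(x, r, x_mean, x_sd, y_mean, y_sd):
    """Most probable Y for a given X, via z-scores: z'_y = r * z_x."""
    z_x = (x - x_mean) / x_sd    # exercise 9: express X in SD units
    z_y = r * z_x                # exercise 10: regress toward the mean
    return y_mean + z_y * y_sd   # exercises 11-12: back to Y's own units

reading = predict_across_scales(132, r=0.6, x_mean=100, x_sd=16, y_mean=5.0, y_sd=1.5)
print(f"expected reading score: {reading:.1f} grade equivalents")  # 6.8

# Exercise 14: percentile ranks, assuming normally distributed scores
unit_normal = NormalDist()                # mean 0, SD 1
iq_pr = 100 * unit_normal.cdf(2.0)        # IQ of 132 is z = 2.0
reading_pr = 100 * unit_normal.cdf(1.2)   # predicted reading score is z = 1.2
print(f"percentile ranks: IQ {iq_pr:.0f}, reading {reading_pr:.0f}")  # 98 vs 88
```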


Other actual examples follow. 



Webb's (1963) study reported that a group of Negro pupils in EMR classes had an average IQ of 68 on the WISC (on which they qualified for EMR classes) but obtained a mean IQ of 74 on the WAIS given two years later. The report concluded, "The most striking finding in this study is the significantly higher IQ's derived from the WAIS..." This reported increase easily falls within the range expected from regression alone.

Delacato (1959) reported large "gains" on a reading test for a group of 
pupils achieving at least 1.5 years below their "expectancy levels" who received 
Doman-Delacato therapy. Large apparent gains could have been predicted, since 
the regression phenomenon would have been operating strongly. 

Another study (Scott & Brinkley, 1960) used the Minnesota Teacher Attitude Inventory and reported that student teachers "...working with supervising teachers whose attitudes toward pupils were, in each instance, superior to their own, improved significantly, as a group, in their attitudes toward pupils during student teaching." These results would be predicted from the regression phenomenon alone.

Some researchers have mistakenly assumed that if a pretest, different from that on which a group was selected, is administered before the treatment, the regression taking place on the second pretest completely eradicates the problem of post-treatment test regression. They incorrectly assume that posttest means can be meaningfully compared with the second pretest mean to assess possible treatment effects. However, other things being equal, tests administered closely in time correlate more highly than those separated by a greater time interval. Hence, greater regression would be expected from the first pretest to the posttest than from the first pretest to the second pretest. All of the regression thus is not eliminated by the use of a second pretest. Those scoring below the mean on the first pretest will score closer to the mean on the posttest than they did on the second pretest in the absence of any treatment or practice effect. For example, going back to Figure 2, suppose the group selected on the pretest was administered another pretest prior to the treatment. If the two pretests correlated .8, then those with IQ scores of 80 on the first pretest would show an average IQ of about 84 on the second pretest, yet they would be expected to have a mean of 88 on the posttest without any treatment effect. Thus, in a pretest-posttest comparison, employment of a second pretest, following a pretest on which subjects are selected, does not eliminate all of the regression artifacts.
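The second-pretest example reduces to applying the same regression rule with two different correlations. A minimal sketch, taking r = .8 (first to second pretest) and r = .6 (first pretest to posttest) from the example and assuming equal means and SDs on all three testings; `expected_score` is an illustrative name.

```python
def expected_score(first_pretest, r, mean=100.0):
    """Expected later score for examinees selected on the first pretest,
    assuming equal means and SDs on both occasions and no treatment effect."""
    return mean + r * (first_pretest - mean)

selected_at = 80
second_pretest = expected_score(selected_at, r=0.8)  # given soon after: r = .8
posttest = expected_score(selected_at, r=0.6)        # given later: r = .6
print(f"second pretest {second_pretest:.0f}, posttest {posttest:.0f}, "
      f"spurious 'gain' {posttest - second_pretest:.0f}")
# second pretest 84, posttest 88, spurious 'gain' 4
```

The 4-point difference is pure regression: the lower correlation over the longer interval lets the group drift further toward the mean.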

THE MATCHING FALLACY AND INTERNAL VALIDITY

The regression effect probably goes unnoticed most often in studies using the matched-pair design. Consider the example given in Figure 4, in which cerebral palsied persons were "matched" on IQ with normal persons. Obviously, the intent was to have a CP vs. non-CP comparison on other variables, free from confounding resulting from intelligence or IQ differences. Typically, the subjects paired together have IQs which fall within a narrow range (e.g., five points). Unfortunately, this procedure almost always results in a real difference between the means of the groups, even on the variable on which they were "matched." In most pairs, the pair-member from the population with a higher mean will have a higher score than his matched pair from the control population.

What if the investigator is aware of this problem and requires that the member of the cerebral palsied group have the higher score in one-half of the pairs of subjects? Regrettably, a real and important difference between the groups on the matching variable will continue to result. (It may not be "statistically significant," however, if the sample size is small, since power would be low.) When the CP child has the higher score of the pair, the difference between the paired IQs would tend to be less than when the normal member has the greater score. Figure 4 graphically illustrates this point: for normals with IQ scores of, for example, 90, almost two-thirds of those CPs having scores within five points of this value will be below 90. On the other hand, for CPs with IQs of 90, most of those normals within the matching range are above 90.
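The asymmetry in the matching band follows from the slope of each distribution near the matching score. Figure 4's actual parameters are not given, so the sketch below uses hypothetical values (normals: mean 100, SD 15; CPs: mean 70, SD 15) purely to show the direction of the effect; `share_below` is an illustrative helper.

```python
from statistics import NormalDist

# Hypothetical parameters: Figure 4 does not state them
normals = NormalDist(mu=100, sigma=15)
cerebral_palsied = NormalDist(mu=70, sigma=15)

def share_below(population, target, band=5):
    """Among members of `population` scoring within +/- band of `target`,
    the proportion scoring below `target` (normal-curve approximation)."""
    in_band = population.cdf(target + band) - population.cdf(target - band)
    below = population.cdf(target) - population.cdf(target - band)
    return below / in_band

# CPs available as matches for a normal child with IQ 90
print(f"CP matches below 90: {share_below(cerebral_palsied, 90):.0%}")
# normals available as matches for a CP child with IQ 90
print(f"normal matches below 90: {share_below(normals, 90):.0%}")
```

With these assumed parameters the proportions come out roughly 61% and 45%: available CP matches tend to fall below 90, while available normal matches tend to fall above it. Different assumed means or SDs change the numbers but not the direction.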

Now consider the situation in which the researcher is aware of the above problems and requires identical scores on the matching variable. Does he eliminate the regression problem? Unfortunately, no.

Suppose an investigator wanted to ascertain whether his creativity-inducing treatment would be more effective with Negro pupils than with Anglo pupils. Assume that he required his matched pairs to have identical pretest scores on Form 1 of the ABC creativity test, which was the selection instrument. The distribution of pretest scores for the total groups (from which the matched pairs were to be selected) is shown on the horizontal axis in Figure 5.

For simplicity, assume the standard T-score means were 40 and 60 for the Negro and Anglo groups, respectively. The investigator then found fifty matched pairs having equal scores on Form 1 of the ABC creativity test, who then became the members of his experimental and control groups. What would happen if he retested his sample with the parallel form of the ABC creativity test (a reliability coefficient of .60 is typical of such tests and is assumed here)? The results are given in Figure 5.

The illustration shows that on the retest, the Anglo pupils would, on the average, be 8 T-score points (or .8 standard deviations) higher than their Negro matched pairs who had an identical score on the initial test. For example, of the matched pairs having a score of 40 on the first test, the Negro mean on the second test would be 40, whereas the Anglo mean would be about 48. In other words, the Anglo mean will be much greater (8 points) on the retest than the Negro mean, due simply to regression effects. The scores of each sample have "regressed" toward the mean of their respective total groups.

Figure 4. Hypothetical IQ distributions of cerebral palsied and normal persons.

In many studies, of which the above example is typical, the investigator pretests, matches subjects, applies treatment, and then retests. He frequently concludes that the treatment was more effective for one group. This conclusion is based upon inadequate awareness of the regression that took place from test to test. We should note that in the example above we are observing only the regression phenomenon, not testing or maturation effects.
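The 8-point gap in the creativity example follows from letting each matched member regress toward the mean of his own population. A minimal sketch, taking the T-score population means (40 and 60) and the parallel-form reliability (.60) from the example, and assuming equal SDs on both forms and a linear regression in each group; `expected_retest` is an illustrative name.

```python
def expected_retest(score, population_mean, reliability):
    """Expected parallel-form score for an examinee with `score` on Form 1,
    regressing toward the mean of the population he was drawn from
    (assumes equal SDs on both forms and a linear regression)."""
    return population_mean + reliability * (score - population_mean)

NEGRO_MEAN, ANGLO_MEAN = 40.0, 60.0   # T-score population means from the example
RELIABILITY = 0.60                    # parallel-form reliability assumed in the example

for matched_score in (40, 45, 50, 55):
    negro = expected_retest(matched_score, NEGRO_MEAN, RELIABILITY)
    anglo = expected_retest(matched_score, ANGLO_MEAN, RELIABILITY)
    print(f"matched at {matched_score}: Negro {negro:.0f}, Anglo {anglo:.0f}, gap {anglo - negro:.0f}")
```

Whatever score the pairs are matched on, the expected gap is (1 − .60)(60 − 40) = 8 T-score points, the .8 standard deviation described in the text.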

Instructional Exercises 



16. Suppose high school male and female students were matched on height (within ½ inch) prior to being compared in some psychomotor skill. Has all of the effect of height been removed from the comparison?

(No) 

17. Would the average height of the matched-pair males be higher than that of their matched-pair females?

(Yes) 

18. Why?

(Since population means differ (see Figure 4), female pair-members would be more apt to be the shorter pair-member; i.e., of all males within ½ inch of a typical female's height, perhaps two-thirds would be taller and only one-third shorter.)

19. If the research design required that the female was to be the taller pair-member in 50% of the pairings, would the average height of the males and females be expected to be equal?

(No, the males would still have a higher mean.) 



20. Return to Figure 4. Suppose you are to find matched pairs (±5 IQ points) of normal persons for a group of CPs. If a CP had an IQ score of 90, what would be the most probable or frequent qualifying IQ score (±5 IQ points) that you would find among the normal group? (The height of the normal distribution curve indicates score frequency.)

(95)

21. But, to turn the illustration around, suppose you are seeking from the CP group a matched pair for a normal pupil having an IQ score of 90. The most probable qualifying pair member from the CP group would have an IQ score of ______.

(85) 

22. In other words, regardless of whether one first has scores from the CP group and then finds the matched pair from the normal group or vice versa, the most probable discrepancy is 5 points favoring the ______ group.

(normal) 

23. If identical observed scores are required, would the mean observed scores be equal?

(Yes) 

24. But would the mean true scores be equal?

(No) 

25. If the matched pairs with identical scores were retested using a parallel form, which group would score higher?

(normals) 

26. Why?

(The average of each group would tend to regress toward its population mean. Since most CPs have IQ scores below 100, more than half of the normals among the matched pairs would have scored below the normals' mean.) Upon retesting, would the mean of the normal group have increased or decreased?

(increased)
One recent study (Dobbs & Neville, 1967) matched 30 non-promoted pupils on race, sex, age, MA, reading, and SES and concluded, "The promoted were better after 2nd and 3rd years in both reading and arithmetic." These results could have been anticipated on the basis of the regression effect alone.

Would the use of gain scores avoid the difficulty? Unfortunately, the correlation of gain and initial scores presents some statistical difficulties (cf. McNemar, 1962), and the use of gain scores is inefficient. If the pretest were used as a covariate in order to equate groups, would the problem be solved? No, the adjusted means would still differ without a treatment effect (eight points in the Anglo-Negro example) in spite of the fact that the original means of both groups remained unchanged. Lord (1967) has graphically illustrated this paradox.

Matching and External Validity 

One can easily see from the example given in Figure 5 that the matched-pair approach also seriously restricts the external validity of the findings when the "matched" subjects are drawn from populations having different means. The majority of the members of the Negro matched-pair sample discussed above would have had scores higher than the Negro group mean, whereas the Anglo sample would have represented below-average subjects from the Anglo population.

Recommendations

Random assignment to treatment and non-treatment groups should be utilized whenever possible when working with non-organismic independent variables (i.e., variables to which subjects can be randomly assigned). However, if a researcher is comparing groups differing in organismic variables, e.g., factors such as sex, ethnic group, and IQ, which do not lend themselves to random assignment, the dependent variable should be residual gain scores, i.e., the difference between predicted scores and obtained scores on the posttest. (This may be difficult to establish since, in order to predict performance, data on a previous group are required.) Using this approach, in the present illustration, no differences would have been found between Anglo and Negro groups in residual gain scores. Additional technical discussions of this problem may be found in Harris (1963), Stanley (1967), and Thorndike (1942, 1963).
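The module recommends residual gain scores but does not spell out how the predicted scores are obtained. A minimal sketch, assuming the prediction equation is an ordinary least-squares line fitted to a previous group's pretest and posttest scores; the data and the function names `fit_prediction` and `residual_gains` are hypothetical, for illustration only.

```python
from statistics import mean

def fit_prediction(prior_pre, prior_post):
    """Least-squares line predicting posttest from pretest, fitted on a previous group."""
    mx, my = mean(prior_pre), mean(prior_post)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prior_pre, prior_post))
    sxx = sum((x - mx) ** 2 for x in prior_pre)
    slope = sxy / sxx
    return slope, my - slope * mx   # (slope, intercept)

def residual_gains(pre, post, slope, intercept):
    """Residual gain = obtained posttest score minus predicted posttest score."""
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]

# Hypothetical data: a previous group supplies the prediction equation
prior_pre = [30, 40, 50, 60, 70]
prior_post = [38, 43, 50, 57, 62]
slope, intercept = fit_prediction(prior_pre, prior_post)

# Hypothetical new examinees, scored against the prior group's prediction line
new_pre = [45, 55]
new_post = [49, 56]
print([round(g, 1) for g in residual_gains(new_pre, new_post, slope, intercept)])
```

Because the prediction line already builds in the expected regression toward the mean, a residual gain near zero indicates performance no better than regression alone would produce.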